AITopics | inclusive kl

Markovian Score Climbing: Variational Inference with KL(p||q)

Neural Information Processing Systems, Dec-24-2025, 11:18:08 GMT

Modern variational inference (VI) uses stochastic gradients to avoid intractable expectations, enabling large-scale probabilistic inference in complex models. VI posits a family of approximating distributions q and then finds the member of that family that is closest to the exact posterior p. Traditionally, VI algorithms minimize the "exclusive Kullback-Leibler (KL)" KL(q||p), often for computational convenience. Recent research, however, has also focused on the "inclusive KL" KL(p||q), which has good statistical properties that make it more appropriate for certain inference problems. This paper develops a simple algorithm for reliably minimizing the inclusive KL using stochastic gradients with vanishing bias. This method, which we call Markovian score climbing (MSC), converges to a local optimum of the inclusive KL. It does not suffer from the systematic errors inherent in existing methods, such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo, which lead to bias in their final estimates. We illustrate convergence on a toy model and demonstrate the utility of MSC on Bayesian probit regression for classification as well as a stochastic volatility model for financial data.
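
For reference, the two divergences named in the abstract can be written out explicitly. This is a sketch in notation assumed here, not taken from the listing: $q_\lambda$ denotes a member of the variational family with parameters $\lambda$, and $z$ denotes the latent variable. The last line shows why the inclusive KL gradient is an expectation of the score of $q_\lambda$ under $p$, which is the quantity the abstract says must be estimated with stochastic gradients.

```latex
\begin{align*}
\mathrm{KL}(q_\lambda \,\|\, p) &= \mathbb{E}_{q_\lambda(z)}\!\left[\log \frac{q_\lambda(z)}{p(z)}\right] && \text{(exclusive)} \\
\mathrm{KL}(p \,\|\, q_\lambda) &= \mathbb{E}_{p(z)}\!\left[\log \frac{p(z)}{q_\lambda(z)}\right] && \text{(inclusive)} \\
\nabla_\lambda \, \mathrm{KL}(p \,\|\, q_\lambda) &= -\,\mathbb{E}_{p(z)}\!\left[\nabla_\lambda \log q_\lambda(z)\right]
\end{align*}
```

The third line assumes that $p$ does not depend on $\lambda$ and that differentiation and expectation can be interchanged.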

markovian score climbing, name change, variational inference, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.60)

b20706935de35bbe643733f856d9e5d6-Paper.pdf

Neural Information Processing Systems, Aug-15-2025, 21:07:25 GMT

approximation, divergence, gradient, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.68)
Asia > Middle East > Jordan (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Markovian Score Climbing: Variational Inference with KL(p||q)

Neural Information Processing Systems, May-27-2025, 09:23:34 GMT

Modern variational inference (VI) uses stochastic gradients to avoid intractable expectations, enabling large-scale probabilistic inference in complex models. VI posits a family of approximating distributions q and then finds the member of that family that is closest to the exact posterior p. Traditionally, VI algorithms minimize the "exclusive Kullback-Leibler (KL)" KL(q||p), often for computational convenience. Recent research, however, has also focused on the "inclusive KL" KL(p||q), which has good statistical properties that make it more appropriate for certain inference problems. This paper develops a simple algorithm for reliably minimizing the inclusive KL using stochastic gradients with vanishing bias. This method, which we call Markovian score climbing (MSC), converges to a local optimum of the inclusive KL.

artificial intelligence, machine learning, markovian score climbing, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.43)

Inclusive KL Minimization: A Wasserstein-Fisher-Rao Gradient Flow Perspective

Zhu, Jia-Jie

arXiv.org Machine Learning, Oct-31-2024

Otto's (2001) Wasserstein gradient flow of the exclusive KL divergence functional provides a powerful and mathematically principled perspective for analyzing learning and inference algorithms. In contrast, algorithms for the inclusive KL inference, i.e., minimizing $ \mathrm{KL}(\pi \| \mu) $ with respect to $ \mu $ for some target $ \pi $, are rarely analyzed using tools from mathematical analysis. This paper shows that a general-purpose approximate inclusive KL inference paradigm can be constructed using the theory of gradient flows derived from PDE analysis. We uncover that several existing learning algorithms can be viewed as particular realizations of the inclusive KL inference paradigm. For example, existing sampling algorithms such as Arbel et al. (2019) and Korba et al. (2021) can be viewed in a unified manner as inclusive-KL inference with approximate gradient estimators. Finally, we provide the theoretical foundation for the Wasserstein-Fisher-Rao gradient flows for minimizing the inclusive KL divergence.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

2411.00214

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Markovian Score Climbing: Variational Inference with KL(p||q)

Neural Information Processing Systems, Oct-11-2024, 03:22:52 GMT

Modern variational inference (VI) uses stochastic gradients to avoid intractable expectations, enabling large-scale probabilistic inference in complex models. VI posits a family of approximating distributions q and then finds the member of that family that is closest to the exact posterior p. Traditionally, VI algorithms minimize the "exclusive Kullback-Leibler (KL)" KL(q||p), often for computational convenience. Recent research, however, has also focused on the "inclusive KL" KL(p||q), which has good statistical properties that make it more appropriate for certain inference problems. This paper develops a simple algorithm for reliably minimizing the inclusive KL using stochastic gradients with vanishing bias. This method, which we call Markovian score climbing (MSC), converges to a local optimum of the inclusive KL.

inclusive kl, markovian score climbing, variational inference, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.43)

Markovian Score Climbing: Variational Inference with KL(p||q)

Naesseth, Christian A., Lindsten, Fredrik, Blei, David

arXiv.org Machine Learning, Mar-23-2020

Modern variational inference (VI) uses stochastic gradients to avoid intractable expectations, enabling large-scale probabilistic inference in complex models. VI posits a family of approximating distributions $q$ and then finds the member of that family that is closest to the exact posterior $p$. Traditionally, VI algorithms minimize the "exclusive KL" KL$(q\|p)$, often for computational convenience. Recent research, however, has also focused on the "inclusive KL" KL$(p\|q)$, which has good statistical properties that make it more appropriate for certain inference problems. This paper develops a simple algorithm for reliably minimizing the inclusive KL. Consider a valid MCMC method, a Markov chain whose stationary distribution is $p$. The algorithm we develop iteratively samples the chain $z[k]$, and then uses those samples to follow the score function of the variational approximation, $\nabla \log q(z[k])$, with a Robbins-Monro step-size schedule. This method, which we call Markovian score climbing (MSC), converges to a local optimum of the inclusive KL. It does not suffer from the systematic errors inherent in existing methods, such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo, which lead to bias in their final estimates. In a variant that ties the variational approximation directly to the Markov chain, MSC further provides a new algorithm that melds VI and MCMC. We illustrate convergence on a toy model and demonstrate the utility of MSC on Bayesian probit regression for classification as well as a stochastic volatility model for financial data.
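
The loop described in this abstract (advance a Markov chain whose stationary distribution is $p$, then step along the score of $q$ at the new sample with Robbins-Monro step sizes) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the one-dimensional Gaussian target, the Gaussian variational family parameterized by `mu` and `log_sigma`, the random-walk Metropolis kernel standing in for "a valid MCMC method", the step-size schedule `step_scale / k ** 0.7`, and all function names.

```python
import math
import random


def log_p_unnorm(z):
    """Unnormalized log density of the toy target p, here N(2, 0.5^2) (assumed)."""
    return -0.5 * ((z - 2.0) / 0.5) ** 2


def mh_step(z, rng, proposal_std=1.0):
    """One random-walk Metropolis step; p is its stationary distribution."""
    z_new = z + rng.gauss(0.0, proposal_std)
    log_accept = log_p_unnorm(z_new) - log_p_unnorm(z)
    if rng.random() < math.exp(min(0.0, log_accept)):
        return z_new
    return z


def score_q(z, mu, log_sigma):
    """Gradient of log N(z; mu, sigma^2) with respect to (mu, log_sigma)."""
    sigma2 = math.exp(2.0 * log_sigma)
    d_mu = (z - mu) / sigma2
    d_log_sigma = (z - mu) ** 2 / sigma2 - 1.0
    return d_mu, d_log_sigma


def markovian_score_climbing(num_iters=20000, step_scale=0.1, seed=0):
    """Alternate one MCMC step on z with one score-ascent step on q's parameters."""
    rng = random.Random(seed)
    z, mu, log_sigma = 0.0, 0.0, 0.0  # chain state and initial q = N(0, 1)
    for k in range(1, num_iters + 1):
        z = mh_step(z, rng)  # z[k] from the chain
        d_mu, d_log_sigma = score_q(z, mu, log_sigma)
        # Step sizes sum to infinity while their squares sum to a finite value.
        eps = step_scale / k ** 0.7
        mu += eps * d_mu
        log_sigma += eps * d_log_sigma
    return mu, math.exp(log_sigma)


mu_hat, sigma_hat = markovian_score_climbing()
print(mu_hat, sigma_hat)
```

Because the assumed family contains the assumed target, the inclusive KL optimum here is the target itself, so the printed estimates should settle near a mean of 2 and a standard deviation of 0.5. The variant mentioned in the abstract, where the Markov kernel is tied to the variational approximation, is not reproduced in this sketch.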

approximation, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2003.10374

Country: